
Dr. Lucy D’Agostino McGowan

Interpretation for Linear Regression

What Does SSE Tell Us?

  • Measures total unexplained variation
  • Smaller SSE = better fit (when comparing models fit to the same data)
  • Units are (units of y)²
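SSE is the sum of the squared residuals \(y_i - \hat{y}_i\). This is a minimal Python sketch of that sum. The toy values are hypothetical and only illustrate the arithmetic. In R, the equivalent is `sum(residuals(fit)^2)`.

```python
import numpy as np

def sse(y, y_hat):
    """Sum of squared errors: total squared gap between observed and fitted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y - y_hat
    return float(np.sum(residuals ** 2))

# Hypothetical toy values, chosen only to illustrate the arithmetic
y = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 7.0]
print(sse(y, y_hat))  # 0.25 + 0.25 + 0 = 0.5
```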

SSE alone isn’t enough

  • Grows with sample size and with the scale (units) of y
  • Needs context, such as the typical size of y, to interpret
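Both dependencies can be seen directly. This sketch uses hypothetical data. It fits the same least-squares line after changing the units of y from dollars to cents, and again after duplicating every row. Neither change alters the quality of the fit, yet the SSE changes.

```python
import numpy as np

def fit_sse(x, y):
    """Fit a least-squares line and return its SSE."""
    slope, intercept = np.polyfit(x, y, deg=1)  # deg=1 returns [slope, intercept]
    y_hat = intercept + slope * x
    return float(np.sum((y - y_hat) ** 2))

# Hypothetical data (not from the course)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_dollars = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
y_cents = 100 * y_dollars  # same information, different units

sse_dollars = fit_sse(x, y_dollars)
sse_cents = fit_sse(x, y_cents)
print(sse_cents / sse_dollars)  # very close to 100**2 = 10000

# Duplicating every observation leaves the fitted line unchanged but doubles SSE
sse_doubled = fit_sse(np.tile(x, 2), np.tile(y_dollars, 2))
print(sse_doubled / sse_dollars)  # very close to 2
```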

From SSE to More Useful Metrics

Mean Squared Error (MSE): \[\text{MSE} = \frac{\text{SSE}}{n-p} = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-p}\]

Why \((n-p)\) instead of \(n\)?

  • \(p\) = number of parameters estimated (including the intercept)
  • Each estimated parameter uses up one degree of freedom, so dividing by \(n-p\) gives an unbiased estimate of the error variance \(\sigma^2\)
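This is a small simulation sketch of the degrees-of-freedom correction. The setup is assumed: true \(\sigma^2 = 4\), \(n = 10\), and a line with \(p = 2\) parameters (intercept and slope). Averaged over many simulated datasets, SSE/(n − p) lands near 4. SSE/n lands near \(4 \cdot (n-p)/n = 3.2\), so it systematically underestimates \(\sigma^2\).

```python
import numpy as np

def mse(y, y_hat, p):
    """SSE divided by the residual degrees of freedom n - p (p counts the intercept)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    return float(np.sum((y - y_hat) ** 2) / (n - p))

# Assumed simulation setup: true sigma^2 = 4, n = 10, p = 2 (intercept + slope)
rng = np.random.default_rng(0)
n, p, sigma2 = 10, 2, 4.0
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept column

est_np, est_n = [], []
for _ in range(20000):
    y = 1.0 + 3.0 * x + rng.normal(0, np.sqrt(sigma2), n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    est_np.append(mse(y, y_hat, p))              # divide by n - p
    est_n.append(np.sum((y - y_hat) ** 2) / n)   # divide by n

print(np.mean(est_np))  # close to 4.0
print(np.mean(est_n))   # close to 3.2: biased low
```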

Interpreting Coefficients

Simple Linear Regression: \(y = \beta_0 + \beta_1 x + \varepsilon\)

  • \(\beta_0\) (intercept): Expected value of \(y\) when \(x = 0\) (only meaningful if \(x = 0\) is a plausible value)
  • \(\beta_1\) (slope): Expected change in \(y\) for a 1-unit increase in \(x\)
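Both interpretations can be checked on any fitted line. In this sketch with hypothetical data, the prediction at \(x = 0\) equals the estimated intercept. Raising \(x\) by one unit, from any starting point, changes the prediction by exactly the estimated slope.

```python
import numpy as np

# Hypothetical (x, y) pairs, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.8, 5.1, 7.0, 8.9])

# np.polyfit with deg=1 returns [slope, intercept]
b1, b0 = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Fitted value b0 + b1 * x_new."""
    return b0 + b1 * x_new

print(predict(0.0), b0)                 # prediction at x = 0 equals the intercept
print(predict(3.5) - predict(2.5), b1)  # a 1-unit increase in x changes the prediction by the slope
```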

Interpreting Coefficients

Multiple Regression: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon\)

  • \(\beta_1\): Expected change in \(y\) for a 1-unit increase in \(x_1\), holding \(x_2\) constant
  • \(\beta_2\): Expected change in \(y\) for a 1-unit increase in \(x_2\), holding \(x_1\) constant
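"Holding \(x_2\) constant" means comparing two predictions that differ only in \(x_1\). This sketch uses simulated data with assumed true coefficients (2, 1.5, −3). In the additive model, a 1-unit increase in \(x_1\) changes the prediction by \(\hat{\beta}_1\) whatever fixed value \(x_2\) takes.

```python
import numpy as np

# Simulated data with assumed true coefficients; x2 is correlated with x1
rng = np.random.default_rng(1)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
y = 2.0 + 1.5 * x1 - 3.0 * x2 + rng.normal(0, 1, n)

# Least-squares fit of y = b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

def predict(v1, v2):
    """Fitted value at x1 = v1, x2 = v2."""
    return b0 + b1 * v1 + b2 * v2

# Raise x1 by one unit while holding x2 fixed, at two different x2 levels
print(predict(5.0, 0.0) - predict(4.0, 0.0))  # equals b1
print(predict(5.0, 4.0) - predict(4.0, 4.0))  # also equals b1
```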

You Try: Coefficient Interpretation

Given the regression: \(\text{Salary} = 40000 + 2000 \times \text{YearsExperience} + 5000 \times \text{HasDegree}\)

where HasDegree = 1 if the person has a degree and 0 otherwise

Interpret each coefficient:

  1. What does 40000 represent?
  2. What does 2000 represent?
  3. What does 5000 represent?
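One way to check your interpretations is to evaluate the equation at chosen inputs. Compare a baseline person with someone who differs in one variable only. This short Python sketch encodes the slide's equation. The function name `salary` is just an illustrative choice.

```python
def salary(years_experience, has_degree):
    """Predicted salary from the slide's equation (has_degree is 0 or 1)."""
    return 40000 + 2000 * years_experience + 5000 * has_degree

print(salary(0, 0))                 # 40000: no experience, no degree
print(salary(6, 0) - salary(5, 0))  # 2000: one more year, degree status unchanged
print(salary(5, 1) - salary(5, 0))  # 5000: degree vs. no degree, same experience
```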

Practical Exercise

  • Log in to RStudio Pro.
  • Using the teengamb dataset from the faraway package, predict gambling expenditure (gamble) from sex, status, and income
  • Fit the model 3 ways:
  1. using the normal equations \(\hat{\beta} = (X^TX)^{-1}X^Ty\), obtained by setting the derivative of the SSE to zero
  2. using QR decomposition
  3. using the lm function
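The exercise is meant to be done in R on the real teengamb data. For orientation only, this is a Python/NumPy sketch of the same three approaches on synthetic stand-in data. The values, effect sizes, and sample size are hypothetical and are not teengamb. `np.linalg.lstsq` plays the role of a library solver such as `lm`; it is not `lm` itself. All three should agree to numerical precision.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, NOT the real teengamb data)
rng = np.random.default_rng(42)
n = 47
sex = rng.integers(0, 2, n)           # 0/1 indicator
status = rng.uniform(15, 75, n)
income = rng.uniform(0.5, 15, n)
gamble = 20 - 25 * sex + 0.1 * status + 5 * income + rng.normal(0, 20, n)

# Design matrix with a leading column of 1s for the intercept
X = np.column_stack([np.ones(n), sex, status, income])
y = gamble

# 1. Normal equations from setting the derivative of the SSE to zero: (X'X) b = X'y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2. QR decomposition: X = QR, so R b = Q'y (R is upper triangular)
Q, R = np.linalg.qr(X)  # reduced QR: Q is n x 4, R is 4 x 4
beta_qr = np.linalg.solve(R, Q.T @ y)

# 3. A library least-squares solver (rough analogue of calling lm in R)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta_normal, 4))
print(np.round(beta_qr, 4))
print(np.round(beta_lstsq, 4))
```

Forming \(X^TX\) squares the condition number of \(X\). The QR route avoids this, which is why least-squares software generally prefers QR-based methods.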